AITopics | length prediction

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Neural Information Processing Systems | Feb-17-2026, 05:01:31 GMT

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DiffER: Categorical Diffusion for Chemical Retrosynthesis

Current, Sean, Chen, Ziqi, Adu-Ampratwum, Daniel, Ning, Xia, Parthasarathy, Srinivasan

arXiv.org Artificial Intelligence | Jun-4-2025

Methods for automatic chemical retrosynthesis have found recent success through the application of models traditionally built for natural language processing, primarily through transformer neural networks. These models have demonstrated significant ability to translate between the SMILES encodings of chemical products and reactants, but are constrained as a result of their autoregressive nature. We propose DiffER, an alternative template-free method for retrosynthesis prediction in the form of categorical diffusion, which allows the entire output SMILES sequence to be predicted in unison. We construct an ensemble of diffusion models which achieves state-of-the-art performance for top-1 accuracy and competitive performance for top-3, top-5, and top-10 accuracy among template-free methods. We prove that DiffER is a strong baseline for a new class of template-free model, capable of learning a variety of synthetic techniques used in laboratory settings and outperforming a variety of other template-free methods on top-k accuracy metrics. By constructing an ensemble of categorical diffusion models with a novel length prediction component with variance, our method is able to approximately sample from the posterior distribution of reactants, producing results with strong metrics of confidence and likelihood. Furthermore, our analyses demonstrate that accurate prediction of the SMILES sequence length is key to further boosting the performance of categorical diffusion models.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.23721

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Materials > Chemicals (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving

Mei, Kai, Xu, Wujiang, Lin, Shuhang, Zhang, Yongfeng

arXiv.org Artificial Intelligence | Mar-7-2025

As large language models (LLMs) are increasingly deployed as service endpoints in systems, the surge in query volume creates significant scheduling challenges. Existing scheduling frameworks mainly target latency optimization while neglecting the capability of LLMs to serve different levels of queries, which can waste computational resources. This paper addresses this challenge by proposing a capability-cost coordinated scheduling framework, ECCOS, for multi-LLM serving, which explicitly constrains response quality and workload to optimize LLM inference cost. Specifically, it introduces two-stage scheduling by designing a multi-objective predictor and a constrained optimizer. The predictor estimates both model capabilities and computational costs through training-based and retrieval-based approaches, while the optimizer determines cost-optimal assignments under quality and workload constraints. It also introduces QAServe, a dataset collected for sample-wise response quality and costs by zero-shot prompting different LLMs on knowledge QA and mathematical reasoning. Extensive experiments demonstrate that ECCOS improves success rates by 6.30% while reducing costs by 10.15% compared to existing methods, consuming less than 0.5% of LLM response time. The code is available at: https://github.com/agiresearch/ECCOS.
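
The two-stage idea in this abstract (predict quality and cost per model, then assign under quality and workload constraints) can be illustrated with a small sketch. This is not the ECCOS optimizer, which the abstract does not specify: the greedy rule, the function name `assign_queries`, the `min_quality` threshold, and the per-model `capacity` counts are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical predictor output for one query:
# model name -> (predicted quality in [0, 1], predicted cost).
Prediction = Dict[str, Tuple[float, float]]


def assign_queries(
    predictions: List[Prediction],
    min_quality: float,
    capacity: Dict[str, int],
) -> List[Optional[str]]:
    """Greedy stand-in for a constrained optimizer (an assumption, not ECCOS).

    Each query goes to the cheapest model whose predicted quality meets
    min_quality and that still has workload capacity; None means no model
    satisfied both constraints.
    """
    remaining = dict(capacity)  # copy so the caller's dict is not mutated
    assignment: List[Optional[str]] = []
    for per_model in predictions:
        feasible = [
            (cost, model)
            for model, (quality, cost) in per_model.items()
            if quality >= min_quality and remaining.get(model, 0) > 0
        ]
        if not feasible:
            assignment.append(None)
            continue
        _, model = min(feasible)  # lowest predicted cost wins
        remaining[model] -= 1
        assignment.append(model)
    return assignment


preds = [
    {"small": (0.90, 1.0), "large": (0.95, 5.0)},
    {"small": (0.40, 1.0), "large": (0.90, 5.0)},
    {"small": (0.85, 1.0), "large": (0.90, 5.0)},
]
result = assign_queries(preds, min_quality=0.8, capacity={"small": 1, "large": 2})
print(result)  # ['small', 'large', 'large']
```

In this toy run the third query would be served well enough by the small model, but that model's capacity is already used, so the workload constraint pushes it to the large one. A greedy pass like this is order-dependent; a real constrained optimizer would consider all queries jointly.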

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.20576

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Don't Stop Me Now: Embedding Based Scheduling for LLMs

Shahout, Rana, Malach, Eran, Liu, Chunwei, Jiang, Weifan, Yu, Minlan, Mitzenmacher, Michael

arXiv.org Artificial Intelligence | Oct-1-2024

Efficient scheduling is crucial for interactive Large Language Model (LLM) applications, where low request completion time directly impacts user engagement. Size-based scheduling algorithms like Shortest Remaining Processing Time (SRPT) aim to reduce average request completion time by leveraging known or estimated request sizes and allowing preemption by incoming jobs with shorter service times. However, two main challenges arise when applying size-based scheduling to LLM systems. First, accurately predicting output lengths from prompts is challenging and often resource-intensive, making it impractical for many systems. As a result, state-of-the-art LLM systems default to first-come, first-served scheduling, which can lead to head-of-line blocking and reduced system efficiency. Second, preemption introduces extra memory overhead to LLM systems as they must maintain intermediate states for unfinished (preempted) requests. In this paper, we propose TRAIL, a method to obtain output predictions from the target LLM itself. After generating each output token, we recycle the embedding of its internal structure as input for a lightweight classifier that predicts the remaining length for each running request. Using these predictions, we propose a prediction-based SRPT variant with limited preemption designed to account for memory overhead in LLM systems. This variant allows preemption early in request execution when memory consumption is low but restricts preemption as requests approach completion to optimize resource utilization. On the theoretical side, we derive a closed-form formula for this SRPT variant in an M/G/1 queue model, which demonstrates its potential value. In our system, we implement this preemption policy alongside our embedding-based prediction method.
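
A minimal sketch of SRPT with limited preemption, as described in this abstract, might look like the following. The abstract says only that preemption is allowed early in a request and restricted near completion; the concrete rule used here (a running request may be preempted only while its progress fraction is below `preempt_limit`) and the names `Request` and `choose_request` are assumptions for illustration, not the TRAIL implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    name: str
    generated: int            # tokens produced so far
    predicted_remaining: int  # predicted tokens still to generate

    @property
    def progress(self) -> float:
        """Fraction of the predicted total length already generated."""
        total = self.generated + self.predicted_remaining
        return self.generated / total if total else 1.0


def choose_request(
    running: Optional[Request],
    waiting: List[Request],
    preempt_limit: float,
) -> Optional[Request]:
    """Pick the request to serve next (hypothetical limited-preemption rule)."""
    if not waiting:
        return running
    shortest = min(waiting, key=lambda r: r.predicted_remaining)
    if running is None:
        return shortest
    # Preempt only if the running request is still early in its execution
    # AND a waiting request is predicted to finish sooner.
    early_enough = running.progress < preempt_limit
    if early_enough and shortest.predicted_remaining < running.predicted_remaining:
        return shortest
    return running


short = Request("short", generated=0, predicted_remaining=15)
early = Request("long-early", generated=10, predicted_remaining=190)
late = Request("long-late", generated=180, predicted_remaining=20)

print(choose_request(early, [short], preempt_limit=0.5).name)  # short
print(choose_request(late, [short], preempt_limit=0.5).name)   # long-late
```

The second call shows the restriction at work: plain SRPT would preempt `long-late` because 15 < 20, but the request is 90% complete, so it keeps running and its accumulated state does not have to be held in memory while another request runs.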

prediction, preemption, scheduling, (14 more...)

arXiv.org Artificial Intelligence

2410.01035

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Improving Citation Text Generation: Overcoming Limitations in Length Control

Mandal, Biswadip, Li, Xiangci, Ouyang, Jessica

arXiv.org Artificial Intelligence | Jul-20-2024

A key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation. While prior works have investigated length-controlled generation, their effectiveness depends on knowing the appropriate generation length. In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of heuristic estimates of desired length.

citation length, computational linguistic, length prediction, (11 more...)

arXiv.org Artificial Intelligence

2407.14997

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Texas > Dallas County > Richardson (0.04)
Europe > Belgium (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

Ye, Jiasheng, Zheng, Zaixiang, Bao, Yu, Qian, Lihua, Gu, Quanquan

arXiv.org Artificial Intelligence | Aug-25-2023

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language models can solve general language tasks comparable to their autoregressive counterparts. This paper demonstrates that scaling diffusion models w.r.t. data, sizes, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining thanks to their intrinsic connections. We then reprogram pretrained masked language models into diffusion language models via diffusive adaptation, wherein task-specific finetuning and instruction finetuning are explored to unlock their versatility in solving general language tasks. Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks. We further discover that instruction finetuning can elicit zero-shot and few-shot in-context learning abilities that help tackle many unseen tasks by following natural language instructions, and show promise in advanced and challenging abilities such as reasoning.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2308.12219

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Improving Autoregressive NLP Tasks via Modular Linearized Attention

Agostinelli, Victor, Chen, Lizhong

arXiv.org Artificial Intelligence | Jun-24-2023

Various natural language processing (NLP) tasks necessitate models that are efficient and small based on their ultimate application at the edge or other resource-constrained environment. While prior research has reduced the size of these models, increasing computational efficiency without considerable performance impacts remains difficult, especially for autoregressive tasks. This paper proposes modular linearized attention (MLA), which combines multiple efficient attention mechanisms, including cosFormer [36], to maximize inference quality while achieving notable speedups.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.08453

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Oregon (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(2 more...)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Zheng, Zangwei, Ren, Xiaozhe, Xue, Fuzhao, Luo, Yang, Jiang, Xin, You, Yang

arXiv.org Artificial Intelligence | May-28-2023

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs. Our approach begins by tapping into the potential of LLMs to accurately perceive and predict the response length with minimal overhead. By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches. We evaluate our approach on real-world instruction datasets using the LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness. Notably, our method is orthogonal to other inference acceleration techniques, making it a valuable addition to many existing toolkits (e.g., FlashAttention, Quantization) for LLM inference.
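
The sequence scheduling step in this abstract (grouping queries with similar predicted response lengths into micro-batches) can be sketched as a sort followed by chunking. This is an illustrative sketch only: the function names, the fixed `batch_size`, and the "decode slots" cost model are assumptions, and the predicted lengths are taken as given rather than produced by an LLM.

```python
from typing import Dict, List


def schedule_micro_batches(
    predicted_lengths: Dict[str, int], batch_size: int
) -> List[List[str]]:
    """Group query ids into micro-batches of similar predicted length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort by predicted length; ties are broken by id for determinism.
    ordered = sorted(predicted_lengths, key=lambda q: (predicted_lengths[q], q))
    # Cut the sorted list into consecutive chunks of batch_size.
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def decode_slots(batches: List[List[str]], predicted_lengths: Dict[str, int]) -> int:
    """Total token slots if every batch runs until its longest member finishes."""
    return sum(len(b) * max(predicted_lengths[q] for q in b) for b in batches)


predictions = {"q1": 20, "q2": 400, "q3": 25, "q4": 380}
batches = schedule_micro_batches(predictions, batch_size=2)
print(batches)                                               # [['q1', 'q3'], ['q4', 'q2']]
print(decode_slots(batches, predictions))                    # 850
print(decode_slots([["q1", "q2"], ["q3", "q4"]], predictions))  # 1560
```

Under this simple cost model, batching in arrival order makes the short queries wait for the long ones in the same batch (1560 slots), while length-aware grouping needs 850, which is the kind of waste the abstract's scheduling technique targets.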

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.13144

Country:

Asia > Singapore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Focused Study on Sequence Length for Dialogue Summarization

Wang, Bin, Zhang, Chen, Wei, Chengwei, Li, Haizhou

arXiv.org Artificial Intelligence | Oct-26-2022

Output length is critical to dialogue summarization systems. The dialogue summary length is determined by multiple factors, including dialogue complexity, summary objective, and personal preferences. In this work, we approach dialogue summary length from three perspectives. First, we analyze the length differences between existing models' outputs and the corresponding human references and find that summarization models tend to produce more verbose summaries due to their pretraining objectives. Second, we identify salient features for summary length prediction by comparing different model settings. Third, we experiment with a length-aware summarizer and show notable improvement on existing models if summary length can be well incorporated. Analysis and experiments are conducted on popular DialogSum and SAMSum datasets to validate our findings.

machine learning, natural language, summary length, (18 more...)

arXiv.org Artificial Intelligence

2209.11910

Country:

North America > United States > California (0.14)
North America > Dominican Republic (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
